Design and Implementation of "Regional Crawler" as a New Strategy for Crawling the Web
Authors
Abstract
With the rapid growth of the World Wide Web, the significance and popularity of search engines are increasing day by day. However, today's web crawlers are unable to update their search engine indexes at the same pace as the growth of the information available on the web. As a result, users are sometimes unable to find recent or updated information. The Regional Crawler we propose in this paper alleviates the problem of updating indexes and discovering new pages to some extent by gathering the common needs and interests of users in a certain domain, which can be as small as a LAN in a university department or as large as a country. In this paper, we introduce the design of the Regional Crawler architecture and discuss its application in search engines.
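The abstract does not spell out how regional interests are gathered or used, so the following Python sketch is only an illustration of the general idea: aggregate a region's query log into an interest profile and use it to order the crawl frontier so that pages matching that profile are fetched and refreshed first. All function names, the term-frequency profile, and the scoring heuristic are assumptions, not the paper's actual design.

```python
import heapq
from collections import Counter

def build_interest_profile(query_log):
    """Aggregate a region's query terms into a weighted interest profile.
    (Hypothetical: the paper's actual aggregation method is not given in the abstract.)"""
    profile = Counter()
    for query in query_log:
        profile.update(query.lower().split())
    total = sum(profile.values()) or 1
    return {term: count / total for term, count in profile.items()}

def score_url(url_terms, profile):
    """Score a candidate URL by how well its anchor/descriptor terms match regional interests."""
    return sum(profile.get(t.lower(), 0.0) for t in url_terms)

def prioritize_frontier(candidates, profile):
    """Yield URLs so that pages matching the region's interests are fetched first."""
    heap = [(-score_url(terms, profile), url) for url, terms in candidates]
    heapq.heapify(heap)
    while heap:
        neg_score, url = heapq.heappop(heap)
        yield url, -neg_score

if __name__ == "__main__":
    # Toy query log for one "region" (e.g., a university department LAN); all data is illustrative.
    log = ["distributed web crawler", "search engine index freshness", "web crawler scheduling"]
    profile = build_interest_profile(log)
    frontier = [
        ("http://example.org/crawler-design", ["web", "crawler", "design"]),
        ("http://example.org/cooking-tips", ["cooking", "tips"]),
    ]
    for url, s in prioritize_frontier(frontier, profile):
        print(f"{s:.3f}  {url}")
```

In a real deployment the interest profile would presumably be rebuilt periodically as the region's query log grows, so that index freshness follows the region's current interests.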
Similar resources
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download only domain-specific web pages, and an unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...
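As an illustration of the URL-queue prioritization this abstract refers to, here is a minimal Python frontier that always pops the highest-scoring URL first. The scoring policy (a caller-supplied relevance value, e.g. inherited from the parent page) and the class name are illustrative assumptions; the cited paper's actual ordering heuristic is not reproduced here.

```python
import heapq
import itertools

class PriorityFrontier:
    """URL frontier that pops the most topic-relevant URL first (a sketch)."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # FIFO tie-breaker for equal scores

    def push(self, url, relevance):
        """Queue a URL with its relevance score; duplicates are ignored."""
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so negate the relevance to pop high scores first.
        heapq.heappush(self._heap, (-relevance, next(self._counter), url))

    def pop(self):
        """Return (url, relevance) for the best-scoring queued URL."""
        neg_rel, _, url = heapq.heappop(self._heap)
        return url, -neg_rel

    def __len__(self):
        return len(self._heap)

if __name__ == "__main__":
    frontier = PriorityFrontier()
    frontier.push("http://example.org/focused-crawling", 0.9)
    frontier.push("http://example.org/unrelated", 0.1)
    frontier.push("http://example.org/web-ir", 0.7)
    while frontier:
        print(frontier.pop())
```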
A Novel Method for Crawler in Domain-specific Search
A focused crawler is a Web crawler that aims to search and retrieve Web pages from the World Wide Web that are related to a domain-specific topic. Rather than downloading all accessible Web pages, a focused crawler analyzes the frontier of the crawled region to visit only the portion of the Web that contains relevant Web pages, and at the same time tries to skip irrelevant regions. In this paper,...
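To make the "visit relevant pages, skip irrelevant regions" behaviour concrete, the sketch below shows a bare-bones focused-crawl loop that only follows the outlinks of pages whose relevance score clears a threshold. The callables `fetch`, `extract_links`, and `relevance`, the fixed threshold, and the toy in-memory "web" are all hypothetical stand-ins, not the method of the cited paper.

```python
def crawl_focused(seed_urls, fetch, extract_links, relevance, threshold=0.5, budget=100):
    """Minimal focused-crawl loop: expand only pages judged relevant,
    so whole irrelevant regions of the Web are skipped."""
    frontier = list(seed_urls)
    visited = set()
    while frontier and budget > 0:
        url = frontier.pop(0)
        if url in visited:
            continue
        visited.add(url)
        budget -= 1
        page = fetch(url)
        if relevance(page) < threshold:
            continue  # irrelevant region: do not follow its outlinks
        frontier.extend(extract_links(page))
    return visited

if __name__ == "__main__":
    # Toy in-memory "web": (text, outlinks) keyed by URL; all data is hypothetical.
    pages = {
        "http://a.example/crawler": ("focused crawler survey", ["http://a.example/ir", "http://b.example/cats"]),
        "http://a.example/ir": ("information retrieval notes", []),
        "http://b.example/cats": ("pictures of cats", ["http://b.example/more-cats"]),
        "http://b.example/more-cats": ("even more cats", []),
    }
    fetch = lambda u: pages.get(u, ("", []))
    extract_links = lambda p: p[1]
    relevance = lambda p: 1.0 if any(w in p[0] for w in ("crawler", "retrieval")) else 0.0
    print(sorted(crawl_focused(["http://a.example/crawler"], fetch, extract_links, relevance)))
```

In the toy run, the irrelevant "cats" page is fetched once but its outlinks are never queued, which is the sense in which an irrelevant region is skipped.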
Distributed Web Crawling Using Network Coordinates
In this report, we outline the relevant background research, design, implementation, and evaluation of a distributed web crawler. Our system is innovative in that it assigns Euclidean coordinates to crawlers and web servers such that the distances in the space give an accurate prediction of download times. We will demonstrate that our method gives the crawler the ability to adapt...
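A minimal sketch of the network-coordinate idea described above, assuming each crawler and web server has already been assigned a Euclidean coordinate (for example by a Vivaldi-style system): Euclidean distance is used as a proxy for download time, and each server is assigned to the crawler predicted to download from it fastest. All coordinates and names are illustrative.

```python
import math

def distance(a, b):
    """Euclidean distance between two coordinate vectors; in a network
    coordinate system this approximates latency/download time."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_servers(crawlers, servers):
    """Assign each web server to the crawler predicted to download from it fastest.
    `crawlers` and `servers` map names to coordinate tuples (illustrative values)."""
    assignment = {}
    for server, s_coord in servers.items():
        best = min(crawlers, key=lambda c: distance(crawlers[c], s_coord))
        assignment[server] = best
    return assignment

if __name__ == "__main__":
    crawlers = {"crawler-eu": (0.0, 1.0), "crawler-us": (5.0, 0.5)}
    servers = {"example.org": (0.5, 1.2), "example.com": (4.8, 0.1)}
    print(assign_servers(crawlers, servers))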
Focused Web Crawling: A Generic Framework for Specifying the User Interest and for Adaptive Crawling Strategies
Compared to standard web search engines, focused crawlers yield good recall as well as good precision by restricting themselves to a limited domain. In this paper, we do not introduce yet another focused crawler; instead, we introduce a generic framework for focused crawling consisting of two major components: (1) specification of the user interest and measurement of the resulting relevance of a given we...
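The first component, a pluggable specification of the user interest, could look roughly like the interface sketched below; the keyword-overlap implementation and the fixed expansion threshold are illustrative assumptions, not the framework proposed in the cited paper.

```python
from abc import ABC, abstractmethod

class InterestSpec(ABC):
    """Pluggable specification of the user's interest; the crawler depends
    only on this interface, not on any particular relevance model."""

    @abstractmethod
    def relevance(self, page_text: str) -> float:
        """Return a relevance score in [0, 1] for a fetched page."""

class KeywordInterest(InterestSpec):
    """One simple, illustrative implementation: the fraction of interest
    keywords that appear in the page text."""

    def __init__(self, keywords):
        self.keywords = [k.lower() for k in keywords]

    def relevance(self, page_text: str) -> float:
        text = page_text.lower()
        hits = sum(1 for k in self.keywords if k in text)
        return hits / len(self.keywords) if self.keywords else 0.0

def should_expand(page_text: str, spec: InterestSpec, threshold: float = 0.3) -> bool:
    """An adaptive crawling strategy could tune `threshold` as it learns;
    here it is a fixed, illustrative cutoff."""
    return spec.relevance(page_text) >= threshold

if __name__ == "__main__":
    spec = KeywordInterest(["focused", "crawler", "relevance"])
    print(should_expand("A focused crawler estimates page relevance online.", spec))
```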
Collaborative Web Crawler over High-speed Research Network
This paper proposes an idea for constructing a distributed web crawler by utilizing existing high-speed research networks. This is an initial effort of the Web Language Engineering (WLE) project, which investigates techniques for processing the languages found in published web documents. In this paper, we focus on designing a geographically distributed web crawler. Multiple crawlers work collabor...
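One common way for multiple crawlers to collaborate is to partition the URL space among nodes, for example by hashing the host name so that each site is handled by exactly one crawler (which also keeps per-host politeness state in one place). The sketch below illustrates only that routing step; the node names are placeholders, and the cited paper's actual partitioning scheme is not described in the excerpt above.

```python
import hashlib
from urllib.parse import urlparse

def owner_node(url: str, nodes: list) -> str:
    """Route a URL to the crawler node responsible for its host.
    Hashing by host keeps all of a site's URLs on a single node."""
    host = urlparse(url).netloc.lower()
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(nodes)
    return nodes[index]

if __name__ == "__main__":
    nodes = ["crawler-A", "crawler-B", "crawler-C"]  # placeholder node names
    for url in ["http://example.org/a", "http://example.org/b", "http://example.com/x"]:
        print(url, "->", owner_node(url, nodes))
```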
Journal:
Volume, Issue:
Pages: -
Publication date: 2004